31.

When Does Scale Anchoring Work? A Case Study

Sandip Sinharay Shelby J. Haberman Yi‐Hsuan Lee 《Journal of Educational Measurement》2011,48(1):61-80

Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement. Scale anchoring, a technique which describes what students at different points on a score scale know and can do, is a tool to provide such information. Scale anchoring for a test involves a substantial amount of work, both by the statistical analysts and test developers involved with the test. In addition, scale anchoring involves considerable use of subjective judgment, so its conclusions may be questionable. We describe statistical procedures that can be used to determine if scale anchoring is likely to be successful for a test. If these procedures indicate that scale anchoring is unlikely to be successful, then there is little reason to perform a detailed scale anchoring study. The procedures are applied to several data sets from a teachers’ licensing test.

32.

Test Score Equating Using a Mini‐Version Anchor and a Midi Anchor: A Case Study Using SAT® Data

Jinghua Liu Sandip Sinharay Paul W. Holland Edward Curley Miriam Feigenbaum 《Journal of Educational Measurement》2011,48(4):361-379

This study explores an anchor that is different from the traditional miniature anchor in test score equating. In contrast to a traditional “mini” anchor that has the same spread of item difficulties as the tests to be equated, the studied anchor, referred to as a “midi” anchor (Sinharay & Holland), has a smaller spread of item difficulties than the tests to be equated. Both anchors were administered in an operational SAT administration and the impact of anchor type on equating was evaluated with respect to systematic error or equating bias. Contradicting the popular belief that the mini anchor is best, the results showed that the mini anchor does not always produce more accurate equating functions than the midi anchor; the midi anchor was found to perform as well as or even better than the mini anchor. Because testing programs usually have more middle difficulty items and few very hard or very easy items, midi external anchors are operationally easier to build. Therefore, the results of our study provide evidence in favor of the midi anchor, the use of which will lead to cost saving with no reduction in equating quality.

33.

Assessing Individual‐Level Impact of Interruptions During Online Testing

Sandip Sinharay Ping Wan Seung W. Choi Dong‐In Kim 《Journal of Educational Measurement》2015,52(1):80-105

With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as Hill and Sinharay et al. examined the impact of interruptions at an aggregate level. However, there is a lack of research on the assessment of impact of interruptions at an individual level. We attempt to fill that void. We suggest four methodological approaches, primarily based on statistical hypothesis testing, linear regression, and item response theory, which can provide evidence on the individual‐level impact of interruptions. We perform a realistic simulation study to compare the Type I error rate and power of the suggested approaches. We then apply the approaches to data from the 2013 Indiana Statewide Testing for Educational Progress‐Plus (ISTEP+) test that experienced interruptions.

34.

Is It Necessary to Make Anchor Tests Mini‐Versions of the Tests Being Equated or Can Some Restrictions Be Relaxed?

Sandip Sinharay Paul W. Holland 《Journal of Educational Measurement》2007,44(3):249-275

It is a widely held belief that anchor tests should be miniature versions (i.e., minitests), with respect to content and statistical characteristics, of the tests being equated. This article examines the foundations for this belief regarding statistical characteristics. It examines the requirement of statistical representativeness of anchor tests that are content representative. The equating performance of several types of anchor tests, including those having statistical characteristics that differ from those of the tests being equated, is examined through several simulation studies and a real data example. Anchor tests with a spread of item difficulties less than that of a total test seem to perform as well as a minitest with respect to equating bias and equating standard error. Hence, the results demonstrate that requiring an anchor test to mimic the statistical characteristics of the total test may be too restrictive and need not be optimal. As a side benefit, this article also provides a comparison of the equating performance of post-stratification equating and chain equipercentile equating.

35.

Are the Nonparametric Person-Fit Statistics More Powerful Than Their Parametric Counterparts? Revisiting the Simulations in Karabatsos (2003)

Sandip Sinharay 《Applied Measurement in Education》2017,30(4):314-328

Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the H^T statistic to be the most powerful in identifying aberrant examinees. He found three statistics, C, MCI, and U3, to be the next most powerful. These four statistics, all of which are nonparametric, were found to perform considerably better than each of 25 parametric person-fit statistics. Dimitrov and Smith replicated part of this finding in a similar study. The present article raises some issues with the comparisons performed in Karabatsos and Dimitrov and Smith and points to literature that suggests that the comparisons could have been performed in a more traditional and more fair manner. The present article then replicates the simulations of Karabatsos and demonstrates in several ways that the parametric person-fit statistics l_z and ECI4_z (that were also considered by Karabatsos) are as powerful as are H^T and U3 in identifying aberrant examinees in more traditional and fair comparisons. Two parametric person-fit statistics are shown to lead to similar results as H^T and U3 in a real data example.

36.

Prediction of Essay Scores From Writing Process and Product Features Using Data Mining Methods

Sandip Sinharay Mo Zhang Paul Deane 《Applied Measurement in Education》2019,32(2):116-137

Analysis of keystroke logging data is of increasing interest, as evident from a substantial amount of recent research on the topic. Some of the research on keystroke logging data has focused on the prediction of essay scores from keystroke logging features, but linear regression is the only prediction method that has been used in this research. Data mining methods such as boosting and random forests have been found to improve over traditional prediction methods such as linear regression in various scientific fields, but have not been used in the prediction of essay scores from keystroke logging features. This article first provides a review of boosting, which is a popular data mining method. The article then applies boosting to predict essay scores from a large number of keystroke logging features and other predictor variables from two real data sets.

37.

A New Subspecies of Primula munroi From the Eastern Himalaya: P. munroi ssp. schizocalyx

Sandip Kumar Basak Gour Gopal Maiti 《Journal of the Graduate School of the Chinese Academy of Sciences》2001,39(5):473-476

38.

Use of Data Mining Methods to Detect Test Fraud

Kaiwen Man Jeffrey R. Harring Sandip Sinharay 《Journal of Educational Measurement》2019,56(2):251-279

Data mining methods have drawn considerable attention across diverse scientific fields. However, few applications could be found in the areas of psychological and educational measurement, and particularly pertinent to this article, in test security research. In this study, various data mining methods for detecting cheating behaviors on large‐scale assessments are explored as an alternative to the traditional methods including person‐fit statistics and similarity analysis. A common data set from the Handbook of Quantitative Methods for Detecting Cheating on Tests (Cizek & Wollack) was used for comparing the performance of the different methods. The results indicated that the use of data mining methods may combine multiple sources of information about test takers' performance, which may lead to higher detection rate over traditional item response and response time methods. Several recommendations, all based on our findings, are provided to practitioners.

39.

A New Statistic for Detection of Aberrant Answer Changes

Sandip Sinharay Minh Q. Duong Scott W. Wood 《Journal of Educational Measurement》2017,54(2):200-217

As noted by Fremer and Olson, analysis of answer changes is often used to investigate testing irregularities because the analysis is readily performed and has proven its value in practice. Researchers such as Belov, Sinharay and Johnson, van der Linden and Jeon, van der Linden and Lewis, and Wollack, Cohen, and Eckerly have suggested several statistics for detection of aberrant answer changes. This article suggests a new statistic that is based on the likelihood ratio test. An advantage of the new statistic is that it follows the standard normal distribution under the null hypothesis of no aberrant answer changes. It is demonstrated in a detailed simulation study that the Type I error rate of the new statistic is very close to the nominal level and the power of the new statistic is satisfactory in comparison to those of several existing statistics for detecting aberrant answer changes. The new statistic and several existing statistics were shown to provide useful information for a real data set. Given the increasing interest in analysis of answer changes, the new statistic promises to be useful to measurement practitioners.

40.

Antioxidant activity of ethanol extract of rhizome of Picrorhiza kurroa on indomethacin induced gastric ulcer during healing

Arun Ray Susri Ray Chaudhuri Biswajit Majumdar Sandip K. Bandyopadhyay 《Indian journal of clinical biochemistry : IJCB》2002,17(2):44-51

Oral administration of ethanol extract of the rhizome of Picrorhiza kurroa at a dose of 20 mg/kg body weight, for 10 consecutive days, was found to enhance the rate of healing of indomethacin-induced gastric ulcer in rats, compared to the ulcerated group without treatment. The level of peroxidised lipid, in terms of thiobarbituric acid reactive species (TBARS), in gastric tissue, was increased in ulcerated rats and was restored to near normalcy on treatment with the ethanol extract. The specific activity of the in vivo antioxidant enzymes, viz. SOD and catalase, and the total tissue sulfhydryl (thiol) group, which were markedly decreased in the ulcerated group, were found to be significantly elevated (p<0.05) on treatment with the above extract, at the specified dose, compared to the indomethacin-induced ulcerated group without any supporting treatment. The present study thus suggests that the ethanol extract of rhizome of Picrorhiza kurroa, at the dose of 20 mg/kg body weight, accelerated the healing of the stomach wall of indomethacin-induced gastric ulcerated rats by an in vivo free radical scavenging action.